Simple models do not remove complexity; they relocate it.
There is a particular kind of optimism hidden in a two-state enum: paid or unpaid.
Clean. Easy to explain. Looks excellent in a product requirements document. You can draw it on a whiteboard, and nobody asks follow-up questions. That is usually the suspicious part.
Then the payment provider arrives with authorization, capture, partial capture, settlement, failed settlement, refunds, disputes, chargebacks, reversals, expired authorizations, manual adjustments, and the small administrative tragedy of "the customer’s bank says one thing and our database says another."
The enum did not simplify the world. It only gave the world fewer places to fit.
And the world, being rude, kept all its complexity anyway.
The model with two chairs
Continuing the scenario, imagine a payment system with this model:
from enum import StrEnum  # available in Python 3.11+

class PaymentStatus(StrEnum):
    UNPAID = "unpaid"
    PAID = "paid"
This is not obviously stupid. In fact, it may be exactly right for the first version of the product. A customer either owes money or does not. The order is either blocked or fulfilled. The invoice is either open or closed. Beautiful.
For about fifteen minutes.
Then someone asks whether a successfully authorized card payment is paid before capture. Someone else asks what happens when capture succeeds but the money does not settle or reconcile as expected. A customer disputes a payment. Finance wants to know whether last month’s revenue needs to be recognized, reversed, deferred, or adjusted after a refund. Support needs to re-open an order that was paid, refunded, and then manually corrected after duplicate, delayed, or out-of-order provider webhooks during an outage.
Suddenly paid and unpaid are not states. They are vibes.
The system still has to respond to all these situations. The question is where those responses live. Maybe in the domain model. Maybe in scattered conditionals. Maybe in admin tools. Maybe in a spreadsheet called payment_fixes_final_v3.xlsx, quietly passed from one support engineer to another like a cursed family recipe.
But the responses have to exist somewhere.
Cybernetics gives us a useful name for why.
Only variety can destroy variety
Cybernetics is the study of control, communication, regulation, and feedback in systems. It has a slightly dusty name now, partly because it sounds like something a 1970s research lab would put on a door next to a room-sized computer. But the ideas are annoyingly useful.
Norbert Wiener gave the field its name in the 1940s. For software design, though, the person I keep coming back to is W. Ross Ashby. In An Introduction to Cybernetics, Ashby gives us the Law of Requisite Variety:
Only variety can destroy variety.
That sentence sounds like it was written by someone trying to win a philosophy argument in a pub. But the idea is practical. Here "destroy" means reduce or absorb: the regulator needs enough possible moves to keep environmental variation from becoming uncontrolled variation in the outcome.
A regulator, sometimes called a controller in cybernetics and control theory, needs enough distinguishable observations and possible responses to handle the disturbances that matter. This is not an MVC controller. It is the part of a control loop that observes what is happening and acts to keep things within some desired range. If the environment can vary in ten important ways, and your regulator can only vary in two, the remaining eight do not politely disappear. They leak.
Take a thermostat. The room gets hotter or colder. The thermostat senses temperature, compares it to a set point, and can turn heating on or off over time. For that tiny world, the controller has enough variety.
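As a minimal sketch of that loop, here is a toy on/off regulator. The class name, the set point, and the dead band are illustrative assumptions, not a description of any real device.

```python
from dataclasses import dataclass


@dataclass
class Thermostat:
    """Toy on/off regulator: sense, compare to a set point, act."""

    set_point: float          # desired temperature (degrees Celsius assumed)
    dead_band: float = 0.5    # tolerance around the set point, avoids rapid switching
    heating_on: bool = False

    def step(self, measured: float) -> bool:
        """Take one reading and return whether heating should be on."""
        if measured < self.set_point - self.dead_band:
            self.heating_on = True    # too cold: turn heating on
        elif measured > self.set_point + self.dead_band:
            self.heating_on = False   # too warm: turn heating off
        # Inside the dead band: keep the previous decision.
        return self.heating_on


thermostat = Thermostat(set_point=21.0)
readings = [19.0, 20.8, 21.6, 21.2]
decisions = [thermostat.step(t) for t in readings]
```

One observed variable and one binary action are enough here only because the disturbance itself has one dimension.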
But imagine using the same thermostat to manage a data center where the problem might be heat, humidity, power draw, rack-level airflow, failing fans, regional energy prices, or a fire alarm. On/off heating is not enough variety. You do not have a regulator. You have a decorative switch with confidence.
Software has this problem wherever the environment has more meaningful cases than the model has responses.
In payments, the environment has refunds, chargebacks, disputes, reversals, failed settlements, expired authorizations, provider outages, duplicate webhooks, and humans making manual corrections under pressure. Your model has paid and unpaid. Good luck.
The missing variety leaks
The important thing about missing variety is that it does not announce itself as a modeling problem. That would be too kind.
It appears as a bug report:
Customer was refunded but order still shows as paid.
It appears as a conditional:
if payment.status == PAID and payment.refunded_at is not None:
    ...
Then another:
if payment.status == PAID and payment.dispute_id is not None:
    ...
Then a helper function:
def is_really_paid(payment):
    ...
Then a second helper function with a more honest name:
def is_paid_for_fulfillment_but_not_for_accounting(payment):
    ...
At some point, the codebase is no longer using the model. It is negotiating with it.
This is how insufficient variety turns into operational debt. The domain distinction still matters, but because the model has nowhere to put it, every caller has to rediscover it locally. Fulfillment invents its version of payment truth. Accounting invents another. Support has a third version, encoded in an admin workflow that everyone is afraid to touch because it fixed a VIP customer issue in 2022.
The system becomes complicated in exactly the place where the model tried to stay simple.
There is a lesson here that is easy to say and hard to practice: simplicity is not the same as having too few distinctions. A good model removes distinctions that do not matter. A bad model removes distinctions that absolutely matter and then discovers them again as outages, exceptions, and meetings with finance.
The world does not care about your enum
One of the most dangerous phrases in software design is:
For now, we only need...
Not because it is always wrong. It is often right. Starting with the smallest useful model is usually good. Please do not build a payment-status taxonomy with thirty-seven states on day one because you read half an essay about cybernetics and got excited. The danger is when "for now" silently becomes "forever," while the environment keeps changing.
A two-state model may be fine when payments are synchronous, refunds are manual, disputes are out of scope, and finance reporting is someone else’s problem. But when those things enter the system, the model has to change too. If it does not, the system compensates by growing side channels.
Boolean flags are a common symptom:
paid = True
refunded = True
disputed = True
settled = False
manually_adjusted = True
This is not necessarily worse than an enum. Sometimes a set of orthogonal facts is exactly the right model. The problem is not flags. The problem is accidentally building a state machine out of flags while pretending you still have a boolean.
Because now you have to answer questions like:
- Can a payment be refunded, voided, or reversed before funds settle?
- Can it be disputed after a partial refund?
- Can a chargeback be reversed?
- Can fulfillment proceed after authorization but before capture?
- Does accounting recognize revenue on fulfillment, capture, settlement, or some separate rule?
- What does support see while all of this is happening?
Those questions are not implementation details. They are the shape of the business.
If the business has to respond differently in each case, the system needs enough variety to represent those cases, observe them, and act on them. Otherwise the system will compensate outside the model.
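One way to make a question like "Can a chargeback be reversed?" answerable is to write the allowed transitions down instead of leaving them implied by flag combinations. The sketch below assumes a hypothetical dispute lifecycle. The state names and the rules in `ALLOWED` are illustrative; the real ones depend on your provider and your business.

```python
from enum import Enum


class DisputeState(Enum):
    NONE = "none"
    OPEN = "open"
    WON = "won"
    LOST = "lost"            # the chargeback stands
    REVERSED = "reversed"    # a lost chargeback was later reversed


# Hypothetical rules: each state maps to the states it may move to.
ALLOWED = {
    DisputeState.NONE: {DisputeState.OPEN},
    DisputeState.OPEN: {DisputeState.WON, DisputeState.LOST},
    DisputeState.WON: set(),
    DisputeState.LOST: {DisputeState.REVERSED},
    DisputeState.REVERSED: set(),
}


def can_transition(current: DisputeState, target: DisputeState) -> bool:
    """Return True if the assumed rules allow moving from current to target."""
    return target in ALLOWED[current]


def transition(current: DisputeState, target: DisputeState) -> DisputeState:
    """Return the new state, or raise if the move is not allowed."""
    if not can_transition(current, target):
        raise ValueError(
            f"illegal dispute transition: {current.value} -> {target.value}"
        )
    return target
```

The table is small, but it turns a business question into something a reviewer can read and a test can check.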
Feedback is part of the model
Requisite variety is not only about having enough states. It is also about knowing which state the world is actually in.
A regulator works by sensing what is happening, comparing it to some desired condition, and correcting when reality drifts. Sense, compare, correct. That loop is so simple it almost sounds useless, which is how you know it is probably fundamental.
A payment system that cannot detect failed settlements, compare them to expected outcomes, and trigger correction is not really regulating payments. It is recording hopeful events.
A deployment system that cannot observe whether the new version is healthy is not really regulating deployment. It is copying files and hoping the story ends well.
An AI coding agent that cannot tell whether its change actually fixed the bug is not really closing the loop. It is producing text near a repository.
Feedback is where the model touches reality. Without it, the system can only believe its own last action. It sent the capture request, so the payment must be captured. It deployed the service, so the service must be running. It generated the patch, so the tests must pass. This is not engineering. This is manifestation.
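A rough sketch of sense, compare, correct for captures might look like the following. It assumes the system keeps its own record of captured amounts and can obtain a report of what the provider says was captured. The function, the `kind` labels, and the payment ids are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Discrepancy:
    payment_id: str
    kind: str                  # "missing_at_provider", "unknown_to_us", or "amount_mismatch"
    expected: Optional[int]    # what our database believes, in minor units
    observed: Optional[int]    # what the provider reports, in minor units


def reconcile(believed: dict, reported: dict) -> list:
    """Compare our beliefs with the provider's report and list every difference."""
    found = []
    for payment_id in sorted(believed.keys() | reported.keys()):
        expected = believed.get(payment_id)
        observed = reported.get(payment_id)
        if observed is None:
            found.append(Discrepancy(payment_id, "missing_at_provider", expected, None))
        elif expected is None:
            found.append(Discrepancy(payment_id, "unknown_to_us", None, observed))
        elif expected != observed:
            found.append(Discrepancy(payment_id, "amount_mismatch", expected, observed))
    return found


believed = {"pay_1": 5000, "pay_2": 1200, "pay_3": 900}
reported = {"pay_1": 5000, "pay_2": 1000, "pay_4": 300}
discrepancies = reconcile(believed, reported)
```

The correction step is deliberately left out: each kind of discrepancy probably needs a different response, which is the point of the next paragraph.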
The Law of Requisite Variety applies here too. It is not enough to have one generic error state called failed. Failed how? Failed permanently? Failed because the provider timed out? Failed because the request was invalid? Failed because the customer’s bank declined it? Failed because our webhook handler crashed after processing the event but before committing the database transaction, which is the kind of sentence that ruins an afternoon?
Different disturbances require different responses. Retry. Ask for a new payment method. Mark for manual review. Reverse fulfillment. Reconcile later. Page someone. Ignore because it is a duplicate webhook and the only emergency is that our logs are dramatic.
If all failures collapse into one bucket at the point of decision, the controller cannot choose the right response without rediscovering the missing distinction somewhere else. The distinction was collapsed by the model, so the variety reappears as human judgment, usually at the worst possible time.
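To keep that distinction available at the point of decision, the failure kind can be carried explicitly and mapped to a response. This is a sketch under assumptions: the kinds come from the examples above, and the policy in `POLICY` is invented for illustration, not a recommendation.

```python
from enum import Enum


class FailureKind(Enum):
    PROVIDER_TIMEOUT = "provider_timeout"
    INVALID_REQUEST = "invalid_request"
    BANK_DECLINED = "bank_declined"
    DUPLICATE_WEBHOOK = "duplicate_webhook"
    HANDLER_CRASHED = "handler_crashed"   # event processed, transaction never committed


class Response(Enum):
    RETRY = "retry"
    ASK_FOR_NEW_PAYMENT_METHOD = "ask_for_new_payment_method"
    MANUAL_REVIEW = "manual_review"
    RECONCILE_LATER = "reconcile_later"
    IGNORE = "ignore"


# Hypothetical policy: one deliberate response per kind of failure.
POLICY = {
    FailureKind.PROVIDER_TIMEOUT: Response.RETRY,
    FailureKind.INVALID_REQUEST: Response.MANUAL_REVIEW,
    FailureKind.BANK_DECLINED: Response.ASK_FOR_NEW_PAYMENT_METHOD,
    FailureKind.DUPLICATE_WEBHOOK: Response.IGNORE,
    FailureKind.HANDLER_CRASHED: Response.RECONCILE_LATER,
}


def choose_response(kind: FailureKind) -> Response:
    """Pick the response for a failure; kinds without a policy go to a human."""
    return POLICY.get(kind, Response.MANUAL_REVIEW)
```

A single `failed` state would force every caller of `choose_response` to guess which row of this table applies.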
Not every distinction deserves a throne
There is a trap in this argument. Once you see missing variety everywhere, you may want to model everything. Do not do that.
The point of Ashby’s law is not "add more states." The point is that a regulator needs enough variety for the disturbances it must actually handle. Not maximum variety. Enough variety.
Some distinctions belong in the core domain model. Some belong in an event log. Some belong in metrics. Some belong in a reconciliation process. Some belong in a support note and should never be promoted into architecture, no matter how emotionally vivid the incident was.
A payment provider may expose forty-seven event types. Your business probably does not need forty-seven first-class payment states. But it may need to distinguish authorized from captured, captured from settled or funded, fully refunded from partially refunded, and an open dispute from a won, lost, or reversed chargeback. Not because those words are fancy. Because the system acts differently.
Often the answer is not one larger PaymentStatus enum, but several owned facts: authorization state, capture state, settlement or funding state, refund state, dispute state, fulfillment eligibility, accounting treatment, and provider event history.
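As a sketch of what "several owned facts" could look like, the record below keeps each lifecycle as its own small enum. Every name and every set of values here is an assumption for illustration; a real system would pick its own facts and probably fewer or more of them.

```python
from dataclasses import dataclass
from enum import Enum


class CaptureState(Enum):
    NOT_CAPTURED = "not_captured"
    PARTIALLY_CAPTURED = "partially_captured"
    CAPTURED = "captured"


class SettlementState(Enum):
    PENDING = "pending"
    SETTLED = "settled"
    FAILED = "failed"


class RefundState(Enum):
    NONE = "none"
    PARTIAL = "partial"
    FULL = "full"


class DisputeState(Enum):
    NONE = "none"
    OPEN = "open"
    WON = "won"
    LOST = "lost"


@dataclass(frozen=True)
class PaymentFacts:
    """Independent facts about one payment, each owned by our own model."""

    authorized: bool
    capture: CaptureState
    settlement: SettlementState
    refund: RefundState
    dispute: DisputeState


# Both of these would have been plain "paid" in the two-state model.
clean = PaymentFacts(True, CaptureState.CAPTURED, SettlementState.SETTLED,
                     RefundState.NONE, DisputeState.NONE)
messy = PaymentFacts(True, CaptureState.CAPTURED, SettlementState.PENDING,
                     RefundState.PARTIAL, DisputeState.OPEN)
```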
This also does not mean copying the provider’s model into your own. That is another way to lose control. Provider events and states are evidence about the world, not automatically your domain vocabulary. The useful move is usually to preserve the provider’s raw events at the edge, translate them through a deliberate anti-corruption layer into internal facts your system owns, and derive business decisions from those facts. The internal facts may still preserve provider-specific details for audit and reconciliation; they just should not make the provider’s vocabulary your core domain model. Otherwise every provider migration becomes a vocabulary crisis with invoices attached.
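A minimal sketch of that translation step follows. The provider event names are made up (no real provider uses the `acme.` names below), and `InternalFact`, `TranslatedEvent`, and `translate` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InternalFact(Enum):
    AUTHORIZED = "authorized"
    CAPTURED = "captured"
    REFUNDED = "refunded"
    DISPUTE_OPENED = "dispute_opened"


# Hypothetical provider vocabulary, mapped to facts our system owns.
PROVIDER_TO_INTERNAL = {
    "acme.auth_ok": InternalFact.AUTHORIZED,
    "acme.capture_ok": InternalFact.CAPTURED,
    "acme.refund_done": InternalFact.REFUNDED,
    "acme.dispute_new": InternalFact.DISPUTE_OPENED,
}


@dataclass(frozen=True)
class TranslatedEvent:
    fact: Optional[InternalFact]   # None: kept as evidence, not yet understood
    raw: dict                      # provider payload preserved for audit


def translate(raw_event: dict) -> TranslatedEvent:
    """Map a raw provider event to an internal fact without discarding the original."""
    fact = PROVIDER_TO_INTERNAL.get(raw_event.get("type"))
    return TranslatedEvent(fact=fact, raw=raw_event)
```

Unknown event types are kept rather than dropped, so a new provider event shows up as unexplained evidence instead of vanishing.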
A useful test is:
Would we make a different decision if we could represent this distinction?
If yes, the distinction may deserve a place in the model.
A related test:
Which component is allowed to make that decision?
If fulfillment, finance, support, and risk all answer differently, you may not have one payment status. You may have several decision-specific projections over the same payment history.
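Here is a sketch of two such projections over one history. The event kinds and both business rules are assumptions invented for the example; the point is only that the two functions read the same events and are allowed to disagree.

```python
from typing import NamedTuple


class Event(NamedTuple):
    kind: str        # "captured", "settled", "refunded", "dispute_opened", "dispute_closed"
    amount: int = 0  # minor units; unused for dispute and settlement events


def net_captured(history: list) -> int:
    """Captured money minus refunded money."""
    captured = sum(e.amount for e in history if e.kind == "captured")
    refunded = sum(e.amount for e in history if e.kind == "refunded")
    return captured - refunded


def dispute_open(history: list) -> bool:
    """True if more disputes were opened than closed."""
    opened = sum(1 for e in history if e.kind == "dispute_opened")
    closed = sum(1 for e in history if e.kind == "dispute_closed")
    return opened > closed


def fulfillment_may_ship(history: list) -> bool:
    # Assumed rule: ship if money remains captured and no dispute is open.
    return net_captured(history) > 0 and not dispute_open(history)


def accounting_settled_net(history: list) -> int:
    # Assumed rule: count net money only once settlement has been observed.
    settled = any(e.kind == "settled" for e in history)
    return net_captured(history) if settled else 0


history = [Event("captured", 5000), Event("refunded", 2000), Event("dispute_opened")]
ship = fulfillment_may_ship(history)
booked = accounting_settled_net(history)
```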
Another useful test:
Are humans already handling this distinction outside the system?
If support, finance, or operations has a manual playbook for a case your model cannot represent, that is not automatically a design failure. But it is evidence. The organization may have discovered a real distinction before the software did.
This is where good modeling gets annoying. You are not trying to mirror reality in full. Reality is too large and badly documented. You are trying to preserve the distinctions required for control. That is a smaller goal. Still hard, unfortunately.
AI agents and insufficient variety
The same pattern shows up outside payments.
AI coding agents can make bad models worse because they tend to amplify the visible structure of the codebase. If the real payment lifecycle lives in support memory, incident notes, and scattered conditionals, the agent sees the debris, not the control loop.
So it continues the local pattern: another flag, another branch, another helper, another special case under the existing special case.
The useful question is not only whether the agent can write code. It is whether the agent-and-workflow have enough variety to handle the problem. For domain-heavy work, that may mean multiple candidate models, real edge cases, tests that encode business distinctions, tools that can observe outcomes, and feedback from the system after the change. If the workflow only exposes one local pattern in context, expect that pattern to win, even when it is wrong.
The question worth asking
The Law of Requisite Variety gives software design a practical question:
What variety does the world throw at this system, and where does our model absorb it?
The blunter version:
What cases are we pretending do not exist because our model has nowhere to put them?
Ask it about payments. Ask it about subscriptions. Ask it about deployments. Ask it about permissions, because permissions are where simple models go to be publicly humiliated. Ask it about AI agents, especially the ones expected to operate across messy repositories with partial context and ambiguous goals. Then look for the leaks.
Bug reports are leaks. Manual playbooks are leaks. Boolean fields with names like is_special_case are leaks. Admin buttons that only one person understands are leaks. Reconciliation scripts can be leaks, or they can be the place you deliberately put the variety.
The point is not to eliminate all of them. Some operational work belongs outside the model. Some weirdness is rare enough that a manual process is cheaper than a permanent abstraction. But you should know which trade you are making.
Because if the variety matters, and your system cannot absorb it, the organization will absorb it instead. Usually in people.
Where this leaves us
The world does not become simpler because your model is simple. If the business can produce refunds, disputes, reversals, and failed settlements, then those cases exist whether your enum has room for them or not.
A good model does not copy the world in all its awful detail. It preserves the distinctions the system needs in order to respond. That is the cybernetic lesson hiding inside the bug report: if your software does not have enough variety for the world it regulates, the missing variety will still be handled. Just badly.